Improving of Segmental LMR-Mapping Based Voice Conversion Method
نویسندگان
چکیده
Spectral over-smoothing is still observable in the converted spectral envelope when linear multivariate regression (LMR) based spectrum mapping is adopted to convert voice. Therefore, in this paper, we study to place a histogram-equalization (HEQ) module immediately before LMR based mapping and to place a target frame selection (TFS) module immediately after LMR based mapping. These two modules are intended to promote the quality of the converted voice. Here, HEQ processing includes the two steps: (a) transform discrete cepstral coefficients (DCC) into principal component analysis (PCA) coefficients; (b) transform PCA coefficients into cumulated density function (CDF) coefficients. As to TFS, an input frame is first processed to obtain its converted DCC and its segment-class number. Then, the group of target-speaker frames corresponding to the same segment-class number is searched to find a target frame whose DCC are sufficiently close to the converted DCC. Next, the converted DCC are replaced by the DCC of the target frame found. In experimental evaluation, the outside parallel sentences (not used in model-parameter training) are used to measure average cepstral distances (ACD) between the converted DCC and the target DCC. When the HEQ module is added, the value of ACD would be increased a little. Furthermore, the value of ACD would be apparently increased when the TFS module is added. Nevertheless, according to the measured VR (variance ratio) values and the scores of subjective listening tests, the quality of the converted voice will become better when HEQ is added, and become much better when TFS is added. As to the reasons for why the measured ACD values and the perceived converted-voice qualities are inconsistent, we have found one possible cause which can explain why this inconsistency may occur.
منابع مشابه
基於音段式LMR 對映之語音轉換方法的改進 (Improving of Segmental LMR-Mapping Based Voice Conversion Methods) [In Chinese]
把一個來源語者(source speaker)的語音轉換成另一個目標語者(target speaker)的語音,這 種處理稱為語音轉換(voice conversion)[1, 2, 3],語音轉換可應用於銜接語音合成處理, 以獲得多樣性的合成語音音色。去年我們曾嘗試以線性多變量迴歸(linear multivariate regression, LMR)來建構一種頻譜對映(mapping)的機制[4],然後用於作語音轉換,希望 藉以改進傳統上基於高斯混合模型(Gaussian mixture model, GMM)之頻譜對映機制[3] 常遇到的一個問題,就是轉換出的頻譜包絡(spectral envelope)會發生過度平滑(over smoothing)的現象。我們經由實驗發現,音段式(segmental) LMR 頻譜對映機制不僅在平 均轉換誤差上可以比傳統 GMM 頻譜對映機...
متن کاملUsing Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
متن کاملVoice conversion methods for vocal tract and pitch contour modification
This study proposes two new methods for detailed modeling and transformation of the vocal tract spectrum and the pitch contour. The first method (selective pre-emphasis) relies on band-pass filtering to perform vocal tract transformation. The second method (segmental pitch contour model) focuses on a more detailed modeling of pitch contours. Both methods are utilized in the design of a voice co...
متن کاملVoice Conversion Methods for Vocal Tract
and transformation of the vocal tract spectrum and the pitch contour. The first method (selective pre-emphasis) relies on band-pass filtering to perform vocal tract transformation. The second method (segmental pitch contour model) focuses on a more detailed modeling of pitch contours. Both methods are utilized in the design of a voice conversion algorithm based on codebook mapping. We compare t...
متن کاملA Voice Conversion Method Combining Segmental GMM Mapping with Target Frame Selection
In this paper, a voice conversion approach that combines two distinct ideas is proposed to improve the converted-voice quality. The first idea is to map spectral features, e.g. discrete cepstrum coefficients (DCC), with segmental Gaussian mixture models (GMMs). That is, a single GMM of a large number of mixture components is replaced here with several voice-content specific GMMs each consisting...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IJCLCLP
دوره 18 شماره
صفحات -
تاریخ انتشار 2013